Speech Recognition on English-Mandarin Code-Switching Data using Factored Language Models - with Part-of-Speech Tags, Language ID and Code-Switch Point Probability as Factors
Authors
Abstract
Code-switching (CS) is defined as "the alternate use of two or more languages in the same utterance or conversation" [1]. CS is a widespread phenomenon in multilingual communities, where multiple languages are used concurrently within a conversation. For automatic speech recognition (ASR), intra-sentential code-switching in particular poses an interesting challenge, because both the language model and the acoustic model must cope with a multilingual context. Common word-based n-gram language models estimate n-grams poorly at code-switch points (CSPs). Our work investigates the use of additional features for ASR on Mandarin-English speech data containing intra-sentential CS. First, we explore various features for multilingual language modeling by predicting the language and CSPs. We investigate the text-based prediction of the language of a word, using the language, part-of-speech (POS) tags, and occurrence features of the previous words in the current utterance. The results of these language prediction experiments are then used to predict CSPs, where we achieve an improvement over random prediction. The results of our experiments on predicting language and CSPs indicate that it is useful to add language and POS tag features to a multilingual language model applied to CS data. Consequently, we incorporate language identification (LID), POS tags, and a CSP probability class into the language model, using factored language models to combine these features. N-best rescoring is applied to use the factored language models in speech recognition. We obtain the highest performance with factored language models that use LID, POS tags, and words as features. We evaluate our results with a mixed error rate (MER), defined as the character error rate (CER) for Mandarin and the word error rate (WER) for English. Using our best-performing factored language model, we improve the MER from 59.1% to 58.4% on the development set and from 60.8% to 59.8% on the evaluation set.
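As an illustration of the MER defined above, the following minimal sketch scores a code-switched hypothesis against its reference by tokenizing Mandarin into characters and English into words, then computing a single edit-distance error rate over the mixed token sequence. The tokenization regex, function names, and toy utterance are illustrative assumptions, not the authors' evaluation code.

```python
import re

def tokenize_cs(text):
    # Mandarin is scored per character, English per word:
    # pull out individual CJK characters and contiguous Latin-script words.
    return re.findall("[\u4e00-\u9fff]|[A-Za-z'-]+", text)

def edit_distance(ref, hyp):
    # Standard Levenshtein distance over token sequences (single-row DP).
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            prev, row[j] = row[j], min(row[j] + 1,          # deletion
                                       row[j - 1] + 1,      # insertion
                                       prev + (r != h))     # substitution
    return row[-1]

def mixed_error_rate(refs, hyps):
    # MER: edit-distance errors over Mandarin characters and English words,
    # normalized by the number of reference tokens.
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize_cs(ref), tokenize_cs(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total

# Toy example (utterance made up for illustration only):
print(mixed_error_rate(["我 想 要 coffee 和 tea"], ["我 要 coffee 和 tea"]))  # ~0.167
```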
Similar resources
Features for factored language models for Code-Switching speech
This paper presents investigations of features which can be used to predict Code-Switching speech. For this task, factored language models are applied and implemented into a state-of-the-art decoder. Different possible factors, such as words, part-of-speech tags, Brown word clusters, open class words and open class word clusters are explored. We find that Brown word clusters, part-of-speech tag...
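To make the factor idea concrete, the short sketch below bundles each word of a code-switched utterance with its POS tag and language ID into one colon-separated feature-value token, in the style accepted by factored-LM toolkits such as SRILM's fngram tools. The W-/P-/L- prefixes, the tags, and the example annotations are illustrative assumptions; a CSP probability class could be attached as a further factor in the same way.

```python
def to_factored_tokens(words, pos_tags, lang_ids):
    # One factored token per word: colon-separated Feature-Value pairs.
    assert len(words) == len(pos_tags) == len(lang_ids)
    return ["W-{}:P-{}:L-{}".format(w, p, l)
            for w, p, l in zip(words, pos_tags, lang_ids)]

# Toy code-switched utterance; POS and language labels are made up.
words = ["我", "要", "order", "一", "杯", "coffee"]
pos   = ["PN", "VV", "VB", "CD", "M", "NN"]
lang  = ["MAN", "MAN", "ENG", "MAN", "MAN", "ENG"]

print(" ".join(to_factored_tokens(words, pos, lang)))
# W-我:P-PN:L-MAN W-要:P-VV:L-MAN W-order:P-VB:L-ENG ...
```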
SEAME: a Mandarin-English code-switching speech corpus in South-East Asia
In Singapore and Malaysia, people often speak a mixture of Mandarin and English within a single sentence. We call such sentences intra-sentential code-switch sentences. In this paper, we report on the development of a Mandarin-English code-switching spontaneous speech corpus: SEAME. The corpus is developed as part of a multilingual speech recognition project and will be used to examine how Manda...
A Mandarin-English Code-Switching Corpus
Generally, the existing monolingual corpora are not suitable for large vocabulary continuous speech recognition (LVCSR) of code-switching speech. The motivation of this paper is to study the rules and constraints that code-switching follows and to design a corpus for the code-switching LVCSR task. This paper presents the development of a Mandarin-English code-switching corpus. This corpus consists of four pa...
An Investigation of Code-Switching Attitude Dependent Language Modeling
In this paper, we investigate the adaptation of language modeling for conversational Mandarin-English Code-Switching (CS) speech and its effect on speech recognition performance. First, we investigate the prediction of code switches based on textual features with a focus on Part-of-Speech (POS) tags. We show that the switching attitude is speaker dependent and utilize this finding to cluster the t...
Crowdsourcing Universal Part-of-Speech Tags for Code-Switching
Code-switching is the phenomenon by which bilingual speakers switch between multiple languages during communication. The importance of developing language technologies for code-switching data is immense, given the large populations that routinely code-switch. High-quality linguistic annotations are extremely valuable for any NLP task, and performance is often limited by the amount of high-qualit...
Publication year: 2011